Applications of Differentiation
Tracking the Sign of the Derivative
If $f$ is a function, then the sign of its derivative, $f'$, indicates whether $f$ is increasing ($f'>0$), decreasing ($f'<0$), or stationary ($f'=0$).
If the derivative takes the value $0$ at a certain point $x_0$, i.e. $f'(x_0)=0$, then the function has a maximum, a minimum, or a saddle point at $x_0$.
Details
If $f$ is a function, then the sign of its derivative, $f'$, indicates whether $f$ is increasing ($f'>0$), decreasing ($f'<0$), or stationary ($f'=0$). $f'$ can be zero at points where $f$ has a maximum, minimum, or a saddle point.
If $f'(x)>0$ for $x<x_0$ and $f'(x)<0$ for $x>x_0$, then $f$ has a maximum at $x_0$.
If $f'(x)<0$ for $x<x_0$ and $f'(x)>0$ for $x>x_0$, then $f$ has a minimum at $x_0$.
If $f'(x)>0$ for $x<x_0$ and $f'(x)>0$ for $x>x_0$, then $f$ has a saddle point at $x_0$.
If $f'(x)<0$ for $x<x_0$ and $f'(x)<0$ for $x>x_0$, then $f$ has a saddle point at $x_0$.
Examples
If $f$ is a function such that its derivative is given by
$$f'(x)=(x-1)(x-2)(x-3)(x-4)$$
then applying the above criteria for maxima and minima, we see that $f$ has maxima at $x=1$ and $x=3$ and has minima at $x=2$ and $x=4$.
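These sign changes can also be checked numerically. Below is a minimal Python sketch, assuming the derivative $f'(x)=(x-1)(x-2)(x-3)(x-4)$ from the example above; the step size eps and the list of candidate points are choices made only for this illustration.

    # Assumed derivative from the example above
    def fprime(x):
        return (x - 1) * (x - 2) * (x - 3) * (x - 4)

    eps = 1e-3                        # small step used to look just left and right of each point
    for x0 in [1, 2, 3, 4]:           # candidate points where f'(x0) = 0
        left, right = fprime(x0 - eps), fprime(x0 + eps)
        if left > 0 and right < 0:
            kind = "maximum"
        elif left < 0 and right > 0:
            kind = "minimum"
        else:
            kind = "saddle point"
        print(f"x0 = {x0}: {kind}")   # prints maxima at 1 and 3, minima at 2 and 4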
Describing Extrema Using $f''$
$x_0$ with $f'(x_0)=0$ corresponds to a maximum if $f''(x_0)<0$.
$x_0$ with $f'(x_0)=0$ corresponds to a minimum if $f''(x_0)>0$.
Details
If $x_0$ corresponds to a maximum, then the derivative is decreasing and the second derivative cannot be positive, i.e. $f''(x_0)\leq 0$. In particular, if the second derivative is strictly negative, $f''(x_0)<0$, then we are assured that the point is indeed a maximum and not a saddle point.
If $x_0$ corresponds to a minimum, then the derivative is increasing and the second derivative cannot be negative, i.e. $f''(x_0)\geq 0$.
If the second derivative is zero, then the point may be a saddle point, as happens with $f(x)=x^3$ at $x=0$.
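As a rough numerical illustration of the second-derivative test, the following sketch approximates $f''$ with a central difference; the functions $x^3$ and $-(x-1)^2$ are chosen here only as examples of an inconclusive case and a clear maximum.

    def second_derivative(f, x, h=1e-4):
        # central difference approximation of f''(x)
        return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

    f = lambda x: x**3             # f'(0) = 0 and f''(0) = 0: test is inconclusive (saddle point)
    g = lambda x: -(x - 1)**2      # g'(1) = 0 and g''(1) = -2 < 0: maximum at x = 1

    print(second_derivative(f, 0.0))   # approximately 0
    print(second_derivative(g, 1.0))   # approximately -2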
The Likelihood Function
If $p$ is the probability mass function (p.m.f.):
$$p(x)=P[X=x]$$
then the joint probability of obtaining a sequence of outcomes $x_1,x_2,\ldots,x_n$ from independent sampling is:
$$p(x_1)\cdot p(x_2)\cdots p(x_n)$$
Suppose each probability includes some parameter $\theta$; this is written:
$$p_{\theta}(x_1)\cdot p_{\theta}(x_2)\cdots p_{\theta}(x_n)$$
If the experiment gives outcomes $x_1,x_2,\ldots,x_n$, we can write the probability as a function of the parameters:
$$L_{x_1,\ldots,x_n}(\theta)=p_{\theta}(x_1)\cdot p_{\theta}(x_2)\cdots p_{\theta}(x_n)$$
This is the likelihood function.
Details
Recall that the probability mass function (p.m.f.) is a function giving the probability of outcomes of an experiment.
We typically denote the p.m.f. by $p$, so $p(x)$ gives the probability of a given outcome, $x$, of an experiment.
The p.m.f. commonly depends on some parameter $\theta$. We often write $p_{\theta}(x)$.
If we take a sample of $n$ independent measurements from $p$, then the joint probability of a given set of numbers $x_1,\ldots,x_n$ is:
$$p(x_1)\cdot p(x_2)\cdots p(x_n)$$
Suppose each probability includes the same parameter $\theta$; then this is typically written:
$$p_{\theta}(x_1)\cdot p_{\theta}(x_2)\cdots p_{\theta}(x_n)$$
Now consider the set of outcomes $x_1,\ldots,x_n$ from the experiment. We can now take the probability of this outcome as a function of the parameters:
$$L_{x_1,\ldots,x_n}(\theta)=p_{\theta}(x_1)\cdot p_{\theta}(x_2)\cdots p_{\theta}(x_n)$$
This is the likelihood function and we often seek to maximize it to estimate the unknown parameters.
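As an illustration, here is a minimal Python sketch that evaluates such a likelihood as a product of p.m.f. values; the Poisson p.m.f. $p_{\theta}(x)=e^{-\theta}\theta^x/x!$ and the small sample below are assumptions chosen only for this example.

    import math

    x = [2, 3, 1, 4, 2]                      # hypothetical sample of n = 5 observations

    def pmf(k, theta):
        # Poisson p.m.f. with parameter theta
        return math.exp(-theta) * theta**k / math.factorial(k)

    def likelihood(theta):
        # product of the individual p.m.f. values p_theta(x_i)
        result = 1.0
        for xi in x:
            result *= pmf(xi, theta)
        return result

    print(likelihood(2.0))
    print(likelihood(2.4))                   # the sample mean; gives a larger likelihood here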
Examples
Suppose we toss a biased coin $n$ independent times and obtain $x$ heads. If $p$ denotes the probability of a head on a single toss, we know the probability of obtaining $x$ heads is:
$$\binom{n}{x}p^{x}(1-p)^{n-x}$$
The parameter of interest is $p$ and the likelihood function is:
$$L(p)=\binom{n}{x}p^{x}(1-p)^{n-x}$$
If $p$ is unknown we sometimes wish to maximize this function with respect to $p$ in order to estimate the true probability $p$.
Plotting the Likelihood
missing slide -- want to give a numeric example and plot
Examples
missing example -- want to give a numeric example and plot
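A minimal Python sketch of the kind of numeric example and plot intended here, assuming the coin-tossing likelihood from above with hypothetical values $n=10$ and $x=7$:

    import numpy as np
    import matplotlib.pyplot as plt
    from math import comb

    n, x = 10, 7                          # hypothetical data: 7 heads in 10 tosses
    p = np.linspace(0.001, 0.999, 500)
    L = comb(n, x) * p**x * (1 - p)**(n - x)

    plt.plot(p, L)
    plt.axvline(x / n, linestyle="--")    # the likelihood peaks at p = x/n = 0.7
    plt.xlabel("p")
    plt.ylabel("L(p)")
    plt.title("Binomial likelihood for n = 10, x = 7")
    plt.show()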
Maximum Likelihood Estimation
If $L(\theta)=L_{x_1,\ldots,x_n}(\theta)$ is a likelihood function for a p.m.f. $p_{\theta}$, then the value $\hat{\theta}$ which gives the maximum of $L$:
$$L(\hat{\theta})=\max_{\theta}L(\theta)$$
is the maximum likelihood estimator (MLE) of $\theta$.
Details
If $L(\theta)=L_{x_1,\ldots,x_n}(\theta)$ is a likelihood function for a p.m.f. $p_{\theta}$, then the value $\hat{\theta}$ which gives the maximum of $L$:
$$L(\hat{\theta})=\max_{\theta}L(\theta)$$
is the maximum likelihood estimator of $\theta$.
Examples
If $x$ is the number of heads from $n$ independent tosses of a coin, the likelihood function is:
$$L(p)=\binom{n}{x}p^{x}(1-p)^{n-x}$$
Maximizing this is equivalent to maximizing the logarithm of the likelihood, since logarithmic functions are increasing. The log-likelihood can be written as:
$$\ln L(p)=\ln\binom{n}{x}+x\ln p+(n-x)\ln(1-p)$$
To find possible maxima, we need to differentiate this formula and set the derivative to zero:
$$0=\frac{d}{dp}\ln L(p)=\frac{x}{p}-\frac{n-x}{1-p}$$
So:
$$x(1-p)=(n-x)p\quad\Leftrightarrow\quad x-xp=np-xp\quad\Leftrightarrow\quad x=np$$
Hence
$$p=\frac{x}{n}$$
is the extreme and so we can write:
$$\hat{p}=\frac{x}{n}$$
for the MLE.
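A minimal numerical check of this result, assuming hypothetical values $n=10$ and $x=7$ and a simple grid search over $p$:

    import numpy as np
    from math import comb

    n, x = 10, 7                          # hypothetical data: 7 heads in 10 tosses
    p = np.linspace(0.001, 0.999, 9999)
    logL = np.log(comb(n, x)) + x * np.log(p) + (n - x) * np.log(1 - p)

    print(p[np.argmax(logL)])             # approximately 0.7
    print(x / n)                          # the closed-form MLE x/n = 0.7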
Least Squares Estimation
Least squares: Estimate the parameters $\theta$ by minimizing:
$$\sum_{i=1}^{n}\left(y_i-g_i(\theta)\right)^2$$
Details
Suppose we have a model linking data to parameters. In general we are predicting $y_i$ as $g_i(\theta)$.
In this case it makes sense to estimate the parameters by minimizing:
$$\sum_{i=1}^{n}\left(y_i-g_i(\theta)\right)^2$$
Examples
One may predict numbers, $x_1,\ldots,x_n$, as a mean, $\mu$, plus error. Consider the simple model $x_i=\mu+\epsilon_i$, where $\mu$ is an unknown parameter (constant) and $\epsilon_i$ is the error in measurement when obtaining the observations, $x_i$, $i=1,\ldots,n$.
A natural method to estimate the parameter is to minimize the squared deviations:
$$\min_{\mu}\sum_{i=1}^{n}\left(x_i-\mu\right)^2$$
It is not hard to see that the $\hat{\mu}$ that minimizes this is the mean:
$$\hat{\mu}=\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$$
One also commonly predicts data $y_1,\ldots,y_n$ with values on a straight line, i.e. with $y_i\approx\alpha+\beta x_i$, where $x_1,\ldots,x_n$ are fixed numbers. This leads to the regression problem of finding parameter values for $\alpha$ and $\beta$ which gives the best fitting straight line in relation to least squares:
$$\min_{\alpha,\beta}\sum_{i=1}^{n}\left(y_i-(\alpha+\beta x_i)\right)^2$$
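A minimal Python sketch of such a straight-line fit, using the standard closed-form least-squares estimates and small made-up data (both are assumptions for illustration only):

    import numpy as np

    # hypothetical data
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # closed-form least-squares estimates for the line y = alpha + beta * x
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha = y.mean() - beta * x.mean()

    print(alpha, beta)                    # intercept near 0 and slope near 2 for these numbers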
As a general exercise in finding the extreme of a function, let's look at the function
$$f(\theta)=\sum_{i=1}^{n}\left(x_i-\theta\right)^2$$
where $x_1,\ldots,x_n$ are some constants. We wish to find the $\theta$ that minimizes this sum. We simply differentiate $f$ to obtain:
$$f'(\theta)=\sum_{i=1}^{n}2\left(x_i-\theta\right)\cdot(-1)=-2\sum_{i=1}^{n}x_i+2n\theta$$
Thus:
$$f'(\theta)=0\quad\Leftrightarrow\quad\theta=\frac{1}{n}\sum_{i=1}^{n}x_i=\bar{x}$$
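A minimal numerical check that the minimizer is indeed the sample mean, using a small made-up sample:

    import numpy as np

    x = np.array([3.0, 5.0, 2.0, 8.0, 6.0])         # hypothetical observations

    def f(theta):
        # sum of squared deviations from theta
        return np.sum((x - theta) ** 2)

    grid = np.linspace(x.min(), x.max(), 10001)
    print(grid[np.argmin([f(t) for t in grid])])    # approximately 4.8
    print(x.mean())                                 # the sample mean, 4.8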